Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this notebook, a template is provided for you to implement your functionality in stages which is required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission, if necessary. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional, and will be marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited by typically double-clicking the cell to enter edit mode.


Step 1: Dataset Exploration

Visualize the German Traffic Signs Dataset. This is open ended, some suggestions include: plotting traffic signs images, plotting the count of each sign, etc. Be creative!

The pickled data is a dictionary with 4 key/value pairs:

  • features -> the images pixel values, (width, height, channels)
  • labels -> the label of the traffic sign
  • sizes -> the original width and height of the image, (width, height)
  • coords -> coordinates of a bounding box around the sign in the image, (x1, y1, x2, y2). Based the original image (not the resized version).
In [6]:
import cv2
import glob
import itertools
import math
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import random
import sklearn.model_selection
import tensorflow as tf
In [7]:
# Load pickled data
import pickle

# Fill this in based on where you saved the training and testing data
training_file = "data/train.p"
testing_file = "data/test.p"

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
In [8]:
### To start off let's do a basic data summary.

# Number of training examples
n_train = len(y_train)

# Number of testing examples
n_test = len(y_test)

# What's the shape of an image?
image_shape = X_train[0].shape

# How many classes are in the dataset
n_classes = len(set(y_train))

print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Number of training examples = 39209
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43
In [4]:
# Display one of each kind of image
def display_unique_classes(X, y,cmap=None):
    labels = set(y)
    for label in labels:
        idx = np.nonzero((y) == label)[0][2]
        image = X[idx,:,:]
        if cmap:
            plt.imshow(image,cmap)
        else:
            plt.imshow(image)
        plt.show()
        
display_unique_classes(X_train, y_train, cmap="gray")
In [5]:
# As it can be seen, there are not an equal number of examples in all classes
plt.hist(y_train,bins=43);
In [6]:
# Compute the max and min observations for any class
# We can see that some classes have as many as 2 thousand examples,
# while another class has as few as 200 examples

dist = np.histogram(y_train, bins=range(44))
classes = dist[1]
counts = dist[0]
max_count = max(counts)
min_count = min(counts)

print("Max observations for a class:", max_count)
print("Min observations for a class:", min_count)
Max observations for a class: 2250
Min observations for a class: 210

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

There are various aspects to consider when thinking about this problem:

  • Your model can be derived from a deep feedforward net or a deep convolutional network.
  • Play around preprocessing techniques (normalization, rgb to grayscale, etc)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper but, it's good practice to try to read papers like these.

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [11]:
def convert_image_to_gray(data):
    # Convert image to grayscale
    return np.mean(data, axis=3)
    
def preprocess_data(data):
     
    # Mean Subtraction to center the data at the origin
    data -= np.mean(data)
    
    # Normalization
    data /= np.std(data, axis = 0)
    
    # PCA and whitening?
    return data
In [13]:
# One hot encode labels
def encode_labels(labels):
    labels = (np.arange(n_classes) == labels[:,None]).astype(np.float)
    return labels
In [9]:
X_train_processed = preprocess_data(convert_image_to_gray(X_train))
X_test_processed = preprocess_data(convert_image_to_gray(X_test))
In [10]:
y_train_one_hot = encode_labels(y_train)
y_test_one_hot = encode_labels(y_test)
In [11]:
display_unique_classes(X_train_processed, y_train, cmap='gray')

Question 1

Describe the techniques used to preprocess the data.

Answer:

  • Converting the image to grayscale: This is done for two reasons:

    1. Simplicity: If we had three color channels in the input image, we would need to perform any operation on all the channels and then combine the results
    2. Data reduction: Processing only a single combined channel instead of three separate RGB channels reduces the amount of data to be processed by about 1/3 and allows the algorithm to run faster. However it comes at the cost of throwing away data (color data) that may be very helpful or required for many image processing applications.
  • One hot encoding the labels: This is done to make computing the cross entropy for the loss function possible

  • Mean Subtraction: This centers the cloud of data around the origin in every dimension.
  • Normalization: Normalize the data dimensions so that they are of the same scale. The way I have done this is by dividing each dimension by its standard deviation once it has been zero-centered. This is not strictly needed for images because the relative scales of pixels are already approximately equal. This process causes each feature to have a similar range so that our gradients don't go out of control (and that we only need one global learning rate multiplier).
  • Image reshaping - convolutions need the image data formatted as a cube (width by height by #channels)

Possible further preprocessing could include:

  1. PCA
  2. Whitening
In [12]:
### Generate additional data for classes that have a small number of examples

def translate_image(image, tx, ty):
    """
        This function will result in a translated image
        Translation = shifting of the object's location
        
        image: the image to be translated
        tx: shift in the x direction
        ty: shift in the y direction
    """
    # Define the transformation matrix
    M = np.float32([[1, 0, tx],[0, 1, ty]])
    rows,cols = image.shape
    
    return cv2.warpAffine(image, M, (cols,rows))
    

plt.subplot(121),plt.imshow(X_train_processed[0], cmap='gray'),plt.title('Input');
translated_image = translate_image(X_train_processed[0], -5, -5)
plt.subplot(122),plt.imshow(translated_image, cmap='gray'),plt.title('Output');
In [13]:
def rotate_image(image, angle):
    """
        This function rotates the given image through the given angle
        
        image: image to be rotated
        angle: angle through which the image is to be rotated
    """
    rows,cols = image.shape
    M = cv2.getRotationMatrix2D((cols/2,rows/2), angle, 1)
    
    return cv2.warpAffine(image, M, (cols,rows))

plt.subplot(121),plt.imshow(X_train_processed[0], cmap='gray'),plt.title('Input');
rotated_image = rotate_image(X_train_processed[0], 45);
plt.subplot(122),plt.imshow(rotated_image, cmap='gray'),plt.title('Output');
In [14]:
## Not sure if we want to use this as we get random images

def get_random_point(x_range, y_range):
    prop = 0.6
    x = random.randint(x_range - int(x_range * prop), x_range - int(x_range * (1-prop)))
    y = random.randint(y_range - int(y_range * prop), y_range - int(y_range * (1-prop)))
    return [x, y]

def get_n_random_points(n, x_range, y_range):
    return np.float32([get_random_point(x_range, y_range) for _ in range(n)])
    
def get_affine_transform(image):
    """
        In an affine transformation, 
        all parallel lines in the original image will
        stay parallel in the output image.
        
        We select three random points in the input image.
        And determine their position(randomly) in the output image
    """
    rows,cols = image.shape
    pts1 = get_n_random_points(3, rows, cols)
    pts2 = get_n_random_points(3, rows, cols)
    M = cv2.getAffineTransform(pts1, pts2)
    return cv2.warpAffine(image, M, (cols,rows))
    

plt.subplot(121),plt.imshow(X_train_processed[0], cmap='gray'),plt.title('Input');
affine_transformed_image = get_affine_transform(X_train_processed[0]);
plt.subplot(122),plt.imshow(affine_transformed_image, cmap='gray'),plt.title('Output');
In [15]:
def generate_training_data_map(X, y):
    """
        Given the two arrays X(array of input vectors) and Y(array of corresponding labels),
        generates a dictionary of the form:
        {
            label: [list of input vectors]
        }
        
    """
    training_data = {}
    classes = set(y)
    
    # Initialize the lists for each label
    for label in classes:
        training_data.setdefault(label, [])
        
    # Append to the list for each label
    for i,label in enumerate(y):
        training_data[label].append(X[i])
    
    return training_data
In [16]:
def generate_rotated_images(image, number_of_rotations):
    """
        image: seed image to generate more data
        number_of_rotations: number of images to be generated by rotating the seed image
    """
    degrees = 360
    degree_rotation_per_image = math.ceil(degrees/number_of_rotations)
    
    return [rotate_image(image, angle) for angle in range(0, 360, degree_rotation_per_image)]
In [17]:
def generate_translated_images(image, number_of_translations):
    """
        image: seed image to generate more data
        number_of_rotations: number of images to be generated by rotating the seed image
    """
    limit = 5
    pairs = list(itertools.combinations(range(-limit, limit), 2))[:number_of_translations]
    return [translate_image(image, pair[0], pair[1]) for pair in pairs]
        
In [18]:
def generate_additional_data(training_data_map):
    """
        For each label, this method calculates the number of examples that need to 
        be generated approximately and generates them by calling in some proportion
        the rotation and translation functions
    """
    proportion_rotation = 0.9
    # number of examples from each category to use to generate new data
    generator_size = 10

    for label in training_data_map:
        num_examples = len(training_data_map[label])
        num_to_generate = max_count - num_examples
        
        if num_to_generate > 0:
            # number of rotated images needed
            num_rotation = int(num_to_generate * proportion_rotation)
            # number of translated images needed
            num_translation = int(num_to_generate * (1 - proportion_rotation))
            
            
            num_rotation_per_example = math.ceil(num_rotation/generator_size)
            num_translation_per_example = math.ceil(num_translation/generator_size)
            
            # Iterate over the first ten examples in each category
            for i in range(generator_size):
                rotated_images = generate_rotated_images(training_data_map[label][i], num_rotation_per_example)
                training_data_map[label].extend(preprocess_data(rotated_images))
            
                translated_images = generate_translated_images(training_data_map[label][i], num_translation_per_example)
                training_data_map[label].extend(preprocess_data(translated_images))
In [19]:
def reformat_for_convolution(data, num_channels, image_shape):
    """
        Convolutions require the image formatted like a cube (with X height X num of channels)
    """
    return data.reshape(-1, image_shape[0], image_shape[1], num_channels).astype(np.float32)
In [20]:
def rebuild_training_data_from_map(training_data_map):
    X_train = []
    y_train = []
    
    for label in training_data_map:
        X_train.extend(training_data_map[label])
        y_train.extend([label] * len(training_data_map[label]))
    return X_train, np.array(y_train)
In [21]:
num_channels = 1

# # convert into an easy way to generate additional data
# training_data_map = generate_training_data_map(X_train_processed, y_train)

# # generate additional data
# generate_additional_data(training_data_map)

# # Convert back to the form X,y
# X_train_new, y_train_new = rebuild_training_data_from_map(training_data_map)

# # One hot encode the training labels
# y_train_new_one_hot = encode_labels(y_train_new)

# #  Convert to an nd_array
# X_train_new = np.ndarray(shape=(len(X_train_new), image_shape[0], image_shape[1]), buffer=np.array(X_train_new))

# Reformat the data to make it convolution-friendly
X_train_new = reformat_for_convolution(X_train_processed, num_channels, image_shape)
X_test_processed = reformat_for_convolution(X_test_processed, num_channels, image_shape)
In [22]:
# Rename variable for consistency
X_train = X_train_new
y_train = y_train
y_train_one_hot = y_train_one_hot

X_test = X_test_processed
y_test = y_test
y_test_one_hot = y_test_one_hot
In [23]:
# Training data
print(type(X_train), X_train.shape)
print(type(y_train), y_train.shape)
print(type(y_train_one_hot), y_train_one_hot.shape)

# Testing data
print(type(X_test), X_test.shape)
print(type(y_test), y_test.shape)
print(type(y_test_one_hot), y_test_one_hot.shape)
<class 'numpy.ndarray'> (39209, 32, 32, 1)
<class 'numpy.ndarray'> (39209,)
<class 'numpy.ndarray'> (39209, 43)
<class 'numpy.ndarray'> (12630, 32, 32, 1)
<class 'numpy.ndarray'> (12630,)
<class 'numpy.ndarray'> (12630, 43)
In [24]:
# All the classes now have roughly the same number of examples
plt.hist(y_train, bins=43);
In [25]:
## Make sure that the labels are correct. Picking a random image to verify
i = 39100
plt.imshow(X_train[i].reshape(32,32), cmap="gray");
print(y_train[i]);
42
In [26]:
### split the data into training/validation/testing sets here.
In [27]:
# proportion of the training data to be used for training
proportion_train = 0.7

count_train = int(len(X_train) * 0.7)

X_train, X_valid, y_train, y_valid = sklearn.model_selection.train_test_split(X_train, y_train, test_size=0.33)
y_train_one_hot = encode_labels(y_train)
y_valid_one_hot = encode_labels(y_valid)
In [28]:
# Ensure all sizes add up
print(len(X_train) + len(X_valid))
print(len(y_train) + len(y_valid))
print(len(y_train_one_hot) + len(y_valid_one_hot))

plt.imshow(X_valid[101].reshape(32, 32), cmap='gray');
y_valid[101]
39209
39209
39209
Out[28]:
18

Question 2

Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?

Answer:

Training, Validation and Testing data sets
  • A test data set is already provided
  • For the purpose of hyperparameter tuning, the training data set is split into two parts - one for training and one for validation.
  • This split is doing using a simple division of the data using a proportion using the scikitlearn function model_selection.train_test_split.
Generating additional data
  • As can be seen from the histogram and the min and the max computed below that, some classes have as many as 2000 examples while another class has as few as 200 examples.
  • This will cause the classifier to be biased towards the classes that have a larger number of examples
  • To prevent this, we generate additional test data for classes which have a smaller number of examples
  • There are three ways in which I am generating additional data:
  • Translations: Shifting the image in the x and y directions
  • Rotation of the image through various angles
  • Affine transformations of the image

Question 3

What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.

Answer:

Architecture Diagram Architecture Diagram

Details

Number of convolutional layers - 2

Number of filters in layer 1 - 16

Number of filters in layer 2 - 36

Number of fully connected layers - 2

Size of fully connected layer 1 - Number of features from the conv layer

Size of fully connected layer 2 - 1024

Dropout % - Not applied yet - future improvement.

Regularization applied to fully connected layer weights - 1e-4

Learning rate decay applied - 1e-4

In [29]:
# Dimensions of a flattened image
image_size_flat = image_shape[0] * image_shape[1]

# Number of input channels(1 for grayscale)
n_channels = 1

# Hyperparameters for the first conv layer
layer_1_filter_size = 5
layer_1_no_filters = 16

# Hyperparameter for the second conv layer
layer_2_filter_size = 5
layer_2_no_filters = 36

# Size of the fully connected layer
fully_conn_size = 128

# Batch size used during training
train_batch_size = 128

# Keep prob used during drop-outs
keep_prob = 0.6

# Regularization
beta = 1e-4
In [30]:
# input variables to the network
x = tf.placeholder(tf.float32, shape=[None, image_shape[0], image_shape[1], n_channels], name='x')

y_true = tf.placeholder(tf.float32, shape=[None, n_classes], name='y_true')
y_true_cls = tf.argmax(y_true, dimension=1)


x_valid = tf.constant(X_valid)
x_test = tf.constant(X_test)
x_new = tf.placeholder(tf.float32, shape=[None, image_shape[0], image_shape[1], n_channels], name='x')

# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
batch = tf.Variable(0)

# Decay once per epoch, using an exponential schedule starting at 0.01.
learning_rate = tf.train.exponential_decay(
      1e-4,                # Base learning rate.
      batch * train_batch_size,  # Current index into the dataset.
      X_train.shape[0],          # Decay step.
      0.95,                # Decay rate.
      staircase=True)
In [31]:
def get_weights(shape):
    """
        Return a tensorflow variable
        which represents a weight matrix of the given shape
        initialized with weights randomly chosen from a gaussian distribution
        with mean 0 and a small standard deviation
    """
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))
In [32]:
def get_biases(shape):
    """
        Return a tensorflow variable
        which represents a bias vectos initialized with all zeros
    """
    return tf.Variable(tf.constant(0.05, shape=shape))
In [33]:
def get_conv_layer(input, weights, biases, stride=1, use_max_pooling=True):
    """
        Return a node in the tensorflow graph which represents a convolutional layer
        input: input to this layer
        no_input_channels: number of input channels to this layer
        filter_size: the dimension of the square filter
        no_output_channels: number of output channels from this layer
        if use_max_pooling is true, a max pooling is applied to the output of the convolutional layer
        if add_relu is true, a relu operation is added to the output
    """
    layer_output = tf.nn.conv2d(input=input,
                               filter=weights,
                               strides=[1, stride, stride, 1],
                               padding="SAME")
    layer_output += biases
    
    if(use_max_pooling):
        layer_output = tf.nn.max_pool(value=layer_output,
                      ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1],
                      padding="SAME")
    
    layer_output = tf.nn.relu(layer_output)
    
    return layer_output
In [34]:
def convert_conv_output_to_fc_input(layer):
    """
        Takes as input the output from a convolutional layer
        And flattens it to be used as input to a fully connected later
    """
    layer_shape = layer.get_shape()
    
    num_features = layer_shape[1:4].num_elements()
    
    layer_flat = tf.reshape(layer, [-1, num_features])
    
    return layer_flat, num_features
In [35]:
def get_fully_connected_layer(input_layer, weights, biases, use_RELU=True, train=False):
    output = tf.matmul(input_layer, weights) + biases
    
    if(train):
        output = tf.nn.dropout(keep_prob=keep_prob)
        
    if(use_RELU):
        output = tf.nn.relu(output)
    
    return output
    
    
In [36]:
weights_conv_layer_1 = get_weights([layer_1_filter_size, layer_1_filter_size, n_channels, layer_1_no_filters])
biases_conv_layer_1 = get_biases([layer_1_no_filters])

weights_conv_layer_2 = get_weights([layer_2_filter_size, layer_2_filter_size, layer_1_no_filters, layer_2_no_filters])
biases_conv_layer_2 = get_biases([layer_2_no_filters])

flattened_layer_size = int(image_shape[0]//4) * int(image_shape[0]//4) * layer_2_no_filters
weights_fully_conn_layer_1 = get_weights([flattened_layer_size, fully_conn_size])
biases_fully_conn_layer_1 = get_biases([fully_conn_size])

weights_fully_conn_layer_2 = get_weights([fully_conn_size, n_classes])
biases_fully_conn_layer_2 = get_biases([n_classes])


def model(data, train=False):
    """
        This is where the network architecture is defined
    """
#     layer_1_output, weights_layer1 = get_conv_layer(data, n_channels, layer_1_no_filters, layer_1_filter_size)
    
#     layer_2_output, weights_layer2 = get_conv_layer(layer_1_output, layer_1_no_filters, layer_2_no_filters, layer_2_filter_size)
    
#     layer_flat, num_features = convert_conv_output_to_fc_input(layer_2_output)
    
#     fully_connected_layer_1_output = get_fully_connected_layer(layer_flat, num_features, fully_conn_size)
    
#     fully_connected_layer_2_output = get_fully_connected_layer(fully_connected_layer_1_output, fully_conn_size, n_classes, use_RELU=False)
    
#     return fully_connected_layer_2_output
    # input, weights, biases, stride=1, use_max_pooling=True
    layer_1_output = get_conv_layer(data, weights_conv_layer_1, biases_conv_layer_1)
    
    layer_2_output = get_conv_layer(layer_1_output, weights_conv_layer_2, biases_conv_layer_2)
    
    layer_flat,num_features = convert_conv_output_to_fc_input(layer_2_output)
    
    fully_connected_layer_1_output = get_fully_connected_layer(layer_flat, weights_fully_conn_layer_1, biases_fully_conn_layer_1, train)
    
    fully_connected_layer_2_output = get_fully_connected_layer(fully_connected_layer_1_output, weights_fully_conn_layer_2, biases_fully_conn_layer_2, use_RELU=False, train=train)
    
    return fully_connected_layer_2_output
    
In [37]:
network_output = model(x)


# Loss and optimizer definitions
cross_entropy = (-tf.reduce_sum(y_true * tf.log(tf.clip_by_value(tf.nn.softmax(network_output),1e-10,1.0))))/train_batch_size
#cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=network_output,
#                                                        labels=y_true)
cost = tf.reduce_mean(cross_entropy) + 
beta *(tf.nn.l2_loss(weights_conv_layer_1) + tf.nn.l2_loss(weights_conv_layer_2) + tf.nn.l2_loss(weights_fully_conn_layer_1) + tf.nn.l2_loss(weights_fully_conn_layer_2))

optimizer = tf.train.GradientDescentOptimizer(1e-4).minimize(cost)


# Predictions and accuracy
y_pred = tf.nn.softmax(network_output)

valid_pred = tf.nn.softmax(model(x_valid))
test_pred = tf.nn.softmax(model(x_test))
new_pred = tf.nn.softmax(model(x_new))

Training the model

In [38]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
In [39]:
def data_iterator(x, y):
    """ A simple data iterator """
    while True:
        # shuffle labels and features
        idxs = np.arange(0, len(x))
        np.random.shuffle(idxs)
        shuf_features = x[idxs]
        shuf_labels = y[idxs]
        batch_size = train_batch_size
        for batch_idx in range(0, len(x), batch_size):
            images_batch = shuf_features[batch_idx:batch_idx+batch_size]
            labels_batch = shuf_labels[batch_idx:batch_idx+batch_size]
            yield images_batch, labels_batch
            
In [40]:
num_iterations = 200001

session = tf.Session()
session.run(tf.global_variables_initializer())

iter_ = data_iterator(X_train, y_train_one_hot)

for i in range(num_iterations):
    # get a batch of data

    x_batch, y_true_batch_one_hot = next(iter_)

#     X_train_subset, _, y_train_subset, _ = sklearn.model_selection.train_test_split(X_train, y_train, train_size=train_batch_size)
#     y_train_subset_one_hot = encode_labels(y_train_subset)
    
    a = {x: x_batch, y_true: y_true_batch_one_hot}
    _, l, predictions = session.run([optimizer, cost, y_pred], feed_dict=a)
    

    if i % 1000 == 0:
        print('Loss at step %d: %f' % (i, l))
        
        print('Training accuracy: %.1f%%' % accuracy(predictions, y_true_batch_one_hot))
        
print('Validation accuracy: %.1f%%' % accuracy(valid_pred.eval(session=session), y_valid_one_hot))        
print('Test accuracy: %.1f%%' % accuracy(test_pred.eval(session=session), y_test_one_hot))
Loss at step 0: 3.771504
Training accuracy: 3.9%
Loss at step 1000: 3.743470
Training accuracy: 4.7%
Loss at step 2000: 3.711973
Training accuracy: 3.1%
Loss at step 3000: 3.650480
Training accuracy: 3.9%
Loss at step 4000: 3.590197
Training accuracy: 5.5%
Loss at step 5000: 3.591193
Training accuracy: 3.9%
Loss at step 6000: 3.556942
Training accuracy: 5.5%
Loss at step 7000: 3.456553
Training accuracy: 7.0%
Loss at step 8000: 3.371591
Training accuracy: 11.7%
Loss at step 9000: 3.386131
Training accuracy: 10.2%
Loss at step 10000: 3.422920
Training accuracy: 5.5%
Loss at step 11000: 3.602076
Training accuracy: 4.7%
Loss at step 12000: 3.386658
Training accuracy: 15.6%
Loss at step 13000: 3.420935
Training accuracy: 6.2%
Loss at step 14000: 3.422586
Training accuracy: 10.2%
Loss at step 15000: 3.294561
Training accuracy: 14.8%
Loss at step 16000: 3.262895
Training accuracy: 16.4%
Loss at step 17000: 3.304788
Training accuracy: 14.1%
Loss at step 18000: 3.180651
Training accuracy: 12.5%
Loss at step 19000: 3.241109
Training accuracy: 9.4%
Loss at step 20000: 3.283566
Training accuracy: 14.8%
Loss at step 21000: 3.427028
Training accuracy: 9.4%
Loss at step 22000: 3.290747
Training accuracy: 9.4%
Loss at step 23000: 3.253706
Training accuracy: 11.7%
Loss at step 24000: 3.401939
Training accuracy: 15.6%
Loss at step 25000: 3.551957
Training accuracy: 7.8%
Loss at step 26000: 3.315591
Training accuracy: 12.5%
Loss at step 27000: 3.108546
Training accuracy: 14.1%
Loss at step 28000: 3.300163
Training accuracy: 18.8%
Loss at step 29000: 3.216027
Training accuracy: 14.8%
Loss at step 30000: 3.203241
Training accuracy: 19.5%
Loss at step 31000: 3.176843
Training accuracy: 18.8%
Loss at step 32000: 3.255190
Training accuracy: 11.7%
Loss at step 33000: 3.135685
Training accuracy: 17.2%
Loss at step 34000: 3.065526
Training accuracy: 18.0%
Loss at step 35000: 3.186663
Training accuracy: 19.5%
Loss at step 36000: 2.970869
Training accuracy: 22.7%
Loss at step 37000: 3.171920
Training accuracy: 17.2%
Loss at step 38000: 2.985327
Training accuracy: 24.2%
Loss at step 39000: 3.113728
Training accuracy: 18.0%
Loss at step 40000: 2.972098
Training accuracy: 19.5%
Loss at step 41000: 3.002759
Training accuracy: 20.3%
Loss at step 42000: 3.074784
Training accuracy: 19.5%
Loss at step 43000: 3.024201
Training accuracy: 18.8%
Loss at step 44000: 2.912766
Training accuracy: 20.3%
Loss at step 45000: 2.969297
Training accuracy: 22.7%
Loss at step 46000: 2.929788
Training accuracy: 26.6%
Loss at step 47000: 2.837561
Training accuracy: 25.8%
Loss at step 48000: 2.962762
Training accuracy: 26.6%
Loss at step 49000: 2.787757
Training accuracy: 28.1%
Loss at step 50000: 2.928507
Training accuracy: 23.4%
Loss at step 51000: 2.712633
Training accuracy: 31.2%
Loss at step 52000: 2.539919
Training accuracy: 35.2%
Loss at step 53000: 2.577975
Training accuracy: 37.5%
Loss at step 54000: 2.669951
Training accuracy: 30.5%
Loss at step 55000: 2.398015
Training accuracy: 38.3%
Loss at step 56000: 2.461705
Training accuracy: 37.5%
Loss at step 57000: 2.437802
Training accuracy: 39.8%
Loss at step 58000: 2.380522
Training accuracy: 39.1%
Loss at step 59000: 2.236767
Training accuracy: 40.6%
Loss at step 60000: 2.329784
Training accuracy: 39.1%
Loss at step 61000: 2.263040
Training accuracy: 46.9%
Loss at step 62000: 2.382339
Training accuracy: 40.6%
Loss at step 63000: 2.160653
Training accuracy: 44.5%
Loss at step 64000: 2.141690
Training accuracy: 41.4%
Loss at step 65000: 2.164782
Training accuracy: 48.4%
Loss at step 66000: 2.221267
Training accuracy: 43.0%
Loss at step 67000: 2.127367
Training accuracy: 46.1%
Loss at step 68000: 2.135086
Training accuracy: 40.6%
Loss at step 69000: 1.946629
Training accuracy: 56.2%
Loss at step 70000: 2.003807
Training accuracy: 50.0%
Loss at step 71000: 1.762445
Training accuracy: 60.9%
Loss at step 72000: 1.800321
Training accuracy: 61.7%
Loss at step 73000: 2.011168
Training accuracy: 43.0%
Loss at step 74000: 1.630602
Training accuracy: 59.4%
Loss at step 75000: 1.846381
Training accuracy: 52.3%
Loss at step 76000: 1.861127
Training accuracy: 52.3%
Loss at step 77000: 1.700985
Training accuracy: 60.2%
Loss at step 78000: 1.727242
Training accuracy: 57.0%
Loss at step 79000: 1.664964
Training accuracy: 56.2%
Loss at step 80000: 1.581879
Training accuracy: 64.1%
Loss at step 81000: 1.504525
Training accuracy: 66.4%
Loss at step 82000: 1.434348
Training accuracy: 63.3%
Loss at step 83000: 1.321012
Training accuracy: 64.8%
Loss at step 84000: 1.343982
Training accuracy: 68.0%
Loss at step 85000: 1.561980
Training accuracy: 57.8%
Loss at step 86000: 1.338420
Training accuracy: 65.6%
Loss at step 87000: 1.314224
Training accuracy: 69.5%
Loss at step 88000: 1.441452
Training accuracy: 65.6%
Loss at step 89000: 1.295169
Training accuracy: 71.1%
Loss at step 90000: 1.316637
Training accuracy: 65.6%
Loss at step 91000: 1.220713
Training accuracy: 70.3%
Loss at step 92000: 1.311116
Training accuracy: 67.2%
Loss at step 93000: 1.210315
Training accuracy: 69.5%
Loss at step 94000: 1.159960
Training accuracy: 75.0%
Loss at step 95000: 1.058022
Training accuracy: 75.0%
Loss at step 96000: 1.097482
Training accuracy: 69.5%
Loss at step 97000: 1.097732
Training accuracy: 72.7%
Loss at step 98000: 0.978598
Training accuracy: 75.8%
Loss at step 99000: 1.114833
Training accuracy: 68.0%
Loss at step 100000: 1.115949
Training accuracy: 73.4%
Loss at step 101000: 1.275397
Training accuracy: 70.3%
Loss at step 102000: 1.024987
Training accuracy: 72.7%
Loss at step 103000: 1.117919
Training accuracy: 74.2%
Loss at step 104000: 1.088924
Training accuracy: 77.3%
Loss at step 105000: 0.969212
Training accuracy: 76.6%
Loss at step 106000: 1.124671
Training accuracy: 78.1%
Loss at step 107000: 0.929699
Training accuracy: 77.3%
Loss at step 108000: 0.930072
Training accuracy: 78.9%
Loss at step 109000: 0.954645
Training accuracy: 75.0%
Loss at step 110000: 0.935906
Training accuracy: 77.3%
Loss at step 111000: 0.977297
Training accuracy: 76.6%
Loss at step 112000: 1.011257
Training accuracy: 74.2%
Loss at step 113000: 0.858773
Training accuracy: 78.1%
Loss at step 114000: 0.838542
Training accuracy: 76.6%
Loss at step 115000: 0.837213
Training accuracy: 80.5%
Loss at step 116000: 0.881071
Training accuracy: 74.2%
Loss at step 117000: 0.762104
Training accuracy: 82.0%
Loss at step 118000: 0.865497
Training accuracy: 78.9%
Loss at step 119000: 0.823615
Training accuracy: 75.0%
Loss at step 120000: 0.795547
Training accuracy: 78.9%
Loss at step 121000: 0.833043
Training accuracy: 76.6%
Loss at step 122000: 0.667907
Training accuracy: 86.7%
Loss at step 123000: 0.766074
Training accuracy: 81.2%
Loss at step 124000: 0.655685
Training accuracy: 81.2%
Loss at step 125000: 0.727664
Training accuracy: 82.0%
Loss at step 126000: 0.728320
Training accuracy: 80.5%
Loss at step 127000: 0.664155
Training accuracy: 84.4%
Loss at step 128000: 0.691339
Training accuracy: 84.4%
Loss at step 129000: 0.669269
Training accuracy: 84.4%
Loss at step 130000: 0.686481
Training accuracy: 82.8%
Loss at step 131000: 0.623436
Training accuracy: 89.1%
Loss at step 132000: 0.739677
Training accuracy: 82.0%
Loss at step 133000: 0.908718
Training accuracy: 75.8%
Loss at step 134000: 0.829097
Training accuracy: 79.7%
Loss at step 135000: 0.706142
Training accuracy: 84.4%
Loss at step 136000: 0.835590
Training accuracy: 79.7%
Loss at step 137000: 0.752169
Training accuracy: 74.2%
Loss at step 138000: 0.632196
Training accuracy: 84.4%
Loss at step 139000: 0.551515
Training accuracy: 89.1%
Loss at step 140000: 0.645098
Training accuracy: 85.9%
Loss at step 141000: 0.487618
Training accuracy: 87.5%
Loss at step 142000: 0.528906
Training accuracy: 84.4%
Loss at step 143000: 0.597713
Training accuracy: 87.5%
Loss at step 144000: 0.677922
Training accuracy: 82.0%
Loss at step 145000: 0.516813
Training accuracy: 88.3%
Loss at step 146000: 0.636836
Training accuracy: 85.2%
Loss at step 147000: 0.535574
Training accuracy: 86.7%
Loss at step 148000: 0.503589
Training accuracy: 84.4%
Loss at step 149000: 0.660482
Training accuracy: 82.8%
Loss at step 150000: 0.530113
Training accuracy: 86.7%
Loss at step 151000: 0.496865
Training accuracy: 88.3%
Loss at step 152000: 0.513527
Training accuracy: 87.5%
Loss at step 153000: 0.451128
Training accuracy: 90.6%
Loss at step 154000: 0.459571
Training accuracy: 86.7%
Loss at step 155000: 0.500050
Training accuracy: 88.3%
Loss at step 156000: 0.582981
Training accuracy: 83.6%
Loss at step 157000: 0.406116
Training accuracy: 85.9%
Loss at step 158000: 0.608467
Training accuracy: 84.4%
Loss at step 159000: 0.672142
Training accuracy: 83.6%
Loss at step 160000: 0.559356
Training accuracy: 84.4%
Loss at step 161000: 0.533186
Training accuracy: 87.5%
Loss at step 162000: 0.568446
Training accuracy: 85.9%
Loss at step 163000: 0.587841
Training accuracy: 84.4%
Loss at step 164000: 0.426354
Training accuracy: 90.6%
Loss at step 165000: 0.426532
Training accuracy: 88.3%
Loss at step 166000: 0.448570
Training accuracy: 88.3%
Loss at step 167000: 0.484165
Training accuracy: 87.5%
Loss at step 168000: 0.592392
Training accuracy: 83.6%
Loss at step 169000: 0.369926
Training accuracy: 92.2%
Loss at step 170000: 0.431233
Training accuracy: 89.8%
Loss at step 171000: 0.416457
Training accuracy: 89.8%
Loss at step 172000: 0.374811
Training accuracy: 95.3%
Loss at step 173000: 0.388932
Training accuracy: 89.1%
Loss at step 174000: 0.341826
Training accuracy: 93.0%
Loss at step 175000: 0.514162
Training accuracy: 87.5%
Loss at step 176000: 0.364004
Training accuracy: 93.0%
Loss at step 177000: 0.340601
Training accuracy: 94.5%
Loss at step 178000: 0.379765
Training accuracy: 90.6%
Loss at step 179000: 0.540882
Training accuracy: 87.5%
Loss at step 180000: 0.459350
Training accuracy: 91.4%
Loss at step 181000: 0.340942
Training accuracy: 91.4%
Loss at step 182000: 0.482486
Training accuracy: 90.6%
Loss at step 183000: 0.426131
Training accuracy: 88.3%
Loss at step 184000: 0.351742
Training accuracy: 92.2%
Loss at step 185000: 0.395431
Training accuracy: 90.6%
Loss at step 186000: 0.449742
Training accuracy: 85.2%
Loss at step 187000: 0.420181
Training accuracy: 89.1%
Loss at step 188000: 0.334274
Training accuracy: 93.8%
Loss at step 189000: 0.422025
Training accuracy: 90.6%
Loss at step 190000: 0.361705
Training accuracy: 89.1%
Loss at step 191000: 0.355958
Training accuracy: 91.4%
Loss at step 192000: 0.262469
Training accuracy: 94.5%
Loss at step 193000: 0.477138
Training accuracy: 83.6%
Loss at step 194000: 0.377954
Training accuracy: 89.8%
Loss at step 195000: 0.445793
Training accuracy: 89.8%
Loss at step 196000: 0.311710
Training accuracy: 93.8%
Loss at step 197000: 0.406381
Training accuracy: 92.2%
Loss at step 198000: 0.341434
Training accuracy: 93.8%
Loss at step 199000: 0.450080
Training accuracy: 91.4%
Loss at step 200000: 0.490404
Training accuracy: 87.5%
Loss at step 201000: 0.378814
Training accuracy: 89.8%
Loss at step 202000: 0.393539
Training accuracy: 92.2%
Loss at step 203000: 0.323927
Training accuracy: 93.8%
Loss at step 204000: 0.408243
Training accuracy: 89.8%
Loss at step 205000: 0.290365
Training accuracy: 93.0%
Loss at step 206000: 0.306655
Training accuracy: 92.2%
Loss at step 207000: 0.232419
Training accuracy: 95.3%
Loss at step 208000: 0.369271
Training accuracy: 89.1%
Loss at step 209000: 0.340126
Training accuracy: 89.8%
Loss at step 210000: 0.286325
Training accuracy: 94.5%
Loss at step 211000: 0.354161
Training accuracy: 95.3%
Loss at step 212000: 0.419936
Training accuracy: 90.6%
Loss at step 213000: 0.311807
Training accuracy: 94.5%
Loss at step 214000: 0.339940
Training accuracy: 93.0%
Loss at step 215000: 0.326072
Training accuracy: 92.2%
Loss at step 216000: 0.344894
Training accuracy: 92.2%
Loss at step 217000: 0.311537
Training accuracy: 91.4%
Loss at step 218000: 0.343751
Training accuracy: 92.2%
Loss at step 219000: 0.382925
Training accuracy: 92.2%
Loss at step 220000: 0.291175
Training accuracy: 91.4%
Loss at step 221000: 0.309963
Training accuracy: 92.2%
Loss at step 222000: 0.290452
Training accuracy: 92.2%
Loss at step 223000: 0.332387
Training accuracy: 90.6%
Loss at step 224000: 0.324135
Training accuracy: 90.6%
Loss at step 225000: 0.273236
Training accuracy: 93.8%
Loss at step 226000: 0.247409
Training accuracy: 95.3%
Loss at step 227000: 0.265748
Training accuracy: 95.3%
Loss at step 228000: 0.222246
Training accuracy: 94.5%
Loss at step 229000: 0.220535
Training accuracy: 98.4%
Loss at step 230000: 0.204520
Training accuracy: 93.0%
Loss at step 231000: 0.264080
Training accuracy: 93.0%
Loss at step 232000: 0.296978
Training accuracy: 96.1%
Loss at step 233000: 0.312251
Training accuracy: 94.5%
Loss at step 234000: 0.408664
Training accuracy: 88.3%
Loss at step 235000: 0.209144
Training accuracy: 96.9%
Loss at step 236000: 0.248860
Training accuracy: 94.5%
Loss at step 237000: 0.296215
Training accuracy: 90.6%
Loss at step 238000: 0.243151
Training accuracy: 96.1%
Loss at step 239000: 0.400381
Training accuracy: 90.6%
Loss at step 240000: 0.201336
Training accuracy: 96.1%
Loss at step 241000: 0.194587
Training accuracy: 93.0%
Loss at step 242000: 0.258829
Training accuracy: 95.3%
Loss at step 243000: 0.248612
Training accuracy: 93.8%
Loss at step 244000: 0.198289
Training accuracy: 93.0%
Loss at step 245000: 0.331902
Training accuracy: 92.2%
Loss at step 246000: 0.320979
Training accuracy: 93.8%
Loss at step 247000: 0.231998
Training accuracy: 93.0%
Loss at step 248000: 0.327963
Training accuracy: 90.6%
Loss at step 249000: 0.158522
Training accuracy: 97.7%
Loss at step 250000: 0.208168
Training accuracy: 95.3%
Loss at step 251000: 0.310343
Training accuracy: 92.2%
Loss at step 252000: 0.158749
Training accuracy: 96.1%
Loss at step 253000: 0.148018
Training accuracy: 98.4%
Loss at step 254000: 0.219941
Training accuracy: 95.3%
Loss at step 255000: 0.302086
Training accuracy: 95.3%
Loss at step 256000: 0.215041
Training accuracy: 96.9%
Loss at step 257000: 0.240649
Training accuracy: 93.0%
Loss at step 258000: 0.157194
Training accuracy: 97.7%
Loss at step 259000: 0.212530
Training accuracy: 94.5%
Loss at step 260000: 0.207533
Training accuracy: 96.1%
Loss at step 261000: 0.360017
Training accuracy: 94.5%
Loss at step 262000: 0.255351
Training accuracy: 89.8%
Loss at step 263000: 0.217106
Training accuracy: 93.8%
Loss at step 264000: 0.183969
Training accuracy: 95.3%
Loss at step 265000: 0.205388
Training accuracy: 95.3%
Loss at step 266000: 0.238713
Training accuracy: 93.8%
Loss at step 267000: 0.304878
Training accuracy: 91.4%
Loss at step 268000: 0.404215
Training accuracy: 92.2%
Loss at step 269000: 0.291132
Training accuracy: 93.8%
Loss at step 270000: 0.282226
Training accuracy: 94.5%
Loss at step 271000: 0.128138
Training accuracy: 96.1%
Loss at step 272000: 0.470057
Training accuracy: 92.2%
Loss at step 273000: 0.257527
Training accuracy: 97.7%
Loss at step 274000: 0.212960
Training accuracy: 95.3%
Loss at step 275000: 0.184861
Training accuracy: 99.2%
Loss at step 276000: 0.230732
Training accuracy: 93.0%
Loss at step 277000: 0.193597
Training accuracy: 96.9%
Loss at step 278000: 0.148437
Training accuracy: 96.1%
Loss at step 279000: 0.124226
Training accuracy: 98.4%
Loss at step 280000: 0.170125
Training accuracy: 96.9%
Loss at step 281000: 0.253918
Training accuracy: 93.8%
Loss at step 282000: 0.352331
Training accuracy: 93.8%
Loss at step 283000: 0.149160
Training accuracy: 97.7%
Loss at step 284000: 0.202183
Training accuracy: 95.3%
Loss at step 285000: 0.202349
Training accuracy: 96.1%
Loss at step 286000: 0.114281
Training accuracy: 96.9%
Loss at step 287000: 0.166356
Training accuracy: 96.1%
Loss at step 288000: 0.189897
Training accuracy: 95.3%
Loss at step 289000: 0.239972
Training accuracy: 94.5%
Loss at step 290000: 0.403845
Training accuracy: 92.2%
Loss at step 291000: 0.137966
Training accuracy: 96.1%
Loss at step 292000: 0.190595
Training accuracy: 95.3%
Loss at step 293000: 0.165580
Training accuracy: 96.1%
Loss at step 294000: 0.198876
Training accuracy: 95.3%
Loss at step 295000: 0.147568
Training accuracy: 96.1%
Loss at step 296000: 0.219383
Training accuracy: 93.0%
Loss at step 297000: 0.189284
Training accuracy: 96.1%
Loss at step 298000: 0.175552
Training accuracy: 96.9%
Loss at step 299000: 0.170526
Training accuracy: 95.3%
Loss at step 300000: 0.151924
Training accuracy: 96.1%
Validation accuracy: 92.9%
Test accuracy: 81.6%
In [47]:
oSaver = tf.train.Saver()
oSaver.save(session, "30001-new%.ckpt") 
Out[47]:
'30001-new%.ckpt'

Question 4

How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)

Answer:

I trained the optimizer as follows:

Hyper-parameter Value Reason
Optimizer Type Gradient Descent Optimizer I started with a simple one but will try out others like Momentum
Batch Size train_batch_size=128 Does not cause memory issues and provides about 82% test accuracy
Number of epochs About 3 Total of 30,000 iterations through the training data in batches of 128
Learning Rate 1e-4 Initially I was using a large learning rate that overshot the minima several times
Regularization Term 1e-4 Prevents over-fitting
Dropout Currently not used Will use in the future to reduce over-fitting

Question 5

What approach did you take in coming up with a solution to this problem?

Answer:

  1. I first tried to solve this problem using what I already knew(fully connected neural network with dropouts, regularization, learning rate decay) and tuned the hyperparameters. The best accuracy I could get was around 75%
  2. I then tried a simple convolutional neural network with two conv layers and started tuning the hyperparameters. The best accuracy I could get was 82%
  3. I then started looking up further on how to optimize the conv net by reading some articles like this and this
  4. I wasn't able to implement all these techniques as the model takes a really long time to run on my machine. But I will continue to experiment further.

Step 3: Test a Model on New Images

Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [14]:
### Load the images and plot them here.
### Feel free to use as many code cells as needed.

def display_images(images, cmap=None, num_rows=3):
    """
        Displays an image array a grid with number of rows=num_rows
    """
    num_images = len(images)
    num_cols = int(num_images/num_rows)
    fig, axes = plt.subplots(num_rows, num_cols)
    
    for i, ax in enumerate(axes.flat):
        # Only plot the valid images
        if i < num_images:
            img = images[i]

            # Plot image.
            if(cmap):
                ax.imshow(img, cmap=cmap)
            else:
                ax.imshow(img)

def read_images(fileNames, display=True):
    images = []
    for file in fileNames:
        image = plt.imread(file)
        image = image.astype(np.float32)
        images.append(image)
    return images    
        

def process_images(images):
    processed_images = []
    for image in images:
        # Resize image
        image = cv2.resize(image, (image_shape[0], image_shape[1]))

        # convert image to grayscale
        image = np.mean(image, axis=2)
    
        # Normalize data
        image = preprocess_data(image)
        
        processed_images.append(image)

    return np.array(processed_images)
    

test_files = [file for file in glob.glob("collected_test_data/*")]


images = read_images(test_files)
display_images(images)

images = process_images(images)
display_images(images, cmap='gray')
images = np.ndarray((len(images), image_shape[0], image_shape[1], 1), buffer=images, dtype=np.float32)

x_collected = images
y_collected_true = np.array([16, 2, 4, 38, 34, 13])
y_collected_true_one_hot = encode_labels(y_collected_true)

Question 6

Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It would be helpful to plot the images in the notebook.

Answer:

I've chosen 5 candidate images. The thing that makes them difficult to classify is that their dimensions are usually a rectangle. So, when I resize it to 32x32, the image is a bit squashed and the neural net may find it difficult to classify. Additionally, one of the images shows a background which is not present in any of the training examples

In [44]:
### Run the predictions here.
### Feel free to use as many code cells as needed.
test_preds = session.run(new_pred, feed_dict={x_new: x_collected, y_true: y_collected_true_one_hot})

test_accuracy = accuracy(test_preds, y_collected_true_one_hot)
print("Test Accuracy: {0:>6}".format(test_accuracy))
Test Accuracy: 16.666666666666668

Question 7

Is your model able to perform equally well on captured pictures or a live camera stream when compared to testing on the dataset?

  • The model does not perform well on these test images due to reason mentioned above
  • Original test accuracy on the test data set provided was about 82%
  • However, I only obtain a 16% accuracy for this sample of 5

Answer:

In [45]:
import pandas as pd

sign_names = pd.read_csv('signnames.csv')

def get_sign_name(class_index):
    row = sign_names[sign_names.ClassId == int(class_index)]
    return row.SignName.values[0]

def get_sign_names(indices):
    return [get_sign_name(index) for index in indices]
In [46]:
### Visualize the softmax probabilities here.
### Feel free to use as many code cells as needed.

import pandas as pd
output = session.run(y_pred, feed_dict={x: x_collected, y_true: y_collected_true_one_hot})

for i, image in enumerate(images):
    # Plot the original image
    plt.subplot(121);
    plt.imshow(image.reshape(image_shape[0], image_shape[1]), cmap='gray');
    plt.title('Image');
    
    # Get the top k probabilities and the classes to which they correspond
    top_k = tf.nn.top_k(output[i], k=5)
    indices = top_k.indices.eval(session=session)
    values = top_k.values.eval(session=session)
    
    # Sort in ascending order
    idx = values.argsort()
    values = values[idx]
    indices = indices[idx]
    
    # Show a bar chart of the probabilities
    y_pos = range(0, 200, 40)
    
    plt.subplot(122)
    plt.bar(y_pos, values, align='center', alpha=0.5, color='red', width=20)
    
    plt.xticks(y_pos, get_sign_names(indices))
    locs, labels = plt.xticks()
    plt.setp(labels, rotation=90)

    plt.ylabel('Probability')
    plt.title('Probability of top 5 classes')
    plt.show()

Question 8

Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)

In [ ]:
 

Answer:

  • Softmax probabilities are shown above

  • The model gets one of the classifications right(The yield sign)

  • For 1 of the images(50kmph) the correct prediction is in the top 5
  • However, for the other 4 images, the correct classification is not even in the top 5 class predictions.

Question 9

If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images.

Answer:

The following operations need to be performed on new images:

# Reading
images = read_images(test_files)

# Pre-processing
images = process_images(images)

# Re-shaping
images = np.ndarray((len(images), image_shape[0], image_shape[1], 1), buffer=images, dtype=np.float32)
x_collected = images

# Manual labelling
y_collected_true = np.array([16, 2, 4, 38, 34, 13])

# One hot encoding of labels
y_collected_true_one_hot = encode_labels(y_collected_true)

# Run Prediction
output = session.run(y_pred, feed_dict={x: x_collected, y_true: y_collected_true_one_hot})

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to \n", "File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.